Combination of Symbolic and Statistical Approaches for Grammatical Knowledge Acquisition

Authors

  • Masaki Kiyono
  • Jun'ichi Tsujii
Abstract

The framework we adopted for customizing linguistic knowledge to individual application domains is an integration of symbolic and statistical approaches. In order to acquire domain specific knowledge, we have previously proposed a rule-based mechanism to hypothesize missing knowledge from partial parsing results of unsuccessfully parsed sentences. In this paper, we focus on the statistical process which selects plausible knowledge from a set of hypotheses generated from the whole corpus. In particular, we introduce two statistical measures of hypotheses, Local Plausibility and Global Plausibility, and describe how these measures are determined iteratively. The proposed method will be incorporated into the tool kit for linguistic knowledge acquisition which we are now developing.

1 Introduction

Current technologies in natural language processing are not so mature as to make general purpose systems applicable to any domains; therefore rapid customization of linguistic knowledge to the sublanguage of an application domain is vital for the development of practical systems. In the currently working systems, such customization has been carried out manually by linguists or lexicographers with time-consuming effort.

We have already proposed a mechanism which acquires sublanguage-specific linguistic knowledge from parsing failures and which can be used as a tool for linguistic knowledge customization (Kiyono and Tsujii, 1993; Kiyono and Tsujii, 1994). Our approach is characterized by a mixture of symbolic and statistical approaches to grammatical knowledge acquisition. Unlike probabilistic parsing, proposed by (Fujisaki et al., 1989; Briscoe and Carroll, 1993), which assumes the prior existence of comprehensive linguistic knowledge, our system can suggest new pieces of knowledge including CFG rules, subcategorization frames, and other lexical features. It also differs from previous proposals on lexical acquisition using statistical measures such as (Church et al., 1991; Brent, 1991; Brown et al., 1993) which either deny the prior existence of linguistic knowledge or use linguistic knowledge in ad hoc ways.

Our system consists of two components: (1) the rule-based component, which detects incompleteness of the existing knowledge and generates a set of hypotheses of new knowledge and (2) the corpus-based component which selects plausible hypotheses on the basis of their statistical behaviour.

As the rule-based component has been explained in our previous papers, in this paper we focus on the corpus-based component. After giving a brief explanation of the framework, we describe a data structure called Hypothesis Graph which plays a crucial role in the corpus-based process, and then introduce two statistical measures of hypotheses, Global Plausibility and Local Plausibility, which are iteratively determined to select a set of plausible hypotheses. An experiment which shows the effectiveness of our method is also given.

* also a staff member of Matsushita Electric Industrial Co., Ltd., Shinagawa, Tokyo, JAPAN.

2 The System Organization

2.1 Hypothesis Generation

Figure 1 shows the framework of our system. When the parser fails to analyse a sentence, the Hypothesis Generator (HG) produces hypotheses of missing knowledge each of which could rectify the defects of the current grammar. As the parser is a sort of Chart Parser and maintains partial parsing results in the form of inactive and active edges, a parsing failure means that no inactive edge of category S spanning the whole sentence exists. The HG tries to introduce an inactive edge of S by making hypotheses of missing linguistic knowledge. It generates hypotheses of rewriting rules which collect existing sequences of inactive edges into an expected category. It also calls itself recursively to in-
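The failure condition and the rule hypotheses described in Section 2.1 can be illustrated with a small sketch. This is not the authors' Hypothesis Generator: the `Edge` class, the function names `parse_failed` and `hypothesize_rules`, and the `max_rhs` length limit are all hypothetical names introduced here for illustration, and the sketch covers only rewriting-rule hypotheses built from adjacent inactive edges, without recursion or lexical hypotheses.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Edge:
    """A chart edge (hypothetical, simplified representation)."""
    start: int            # position where the edge begins
    end: int              # position where the edge ends
    category: str         # e.g. "NP", "V", "S"
    active: bool = False  # active edges are still incomplete


def parse_failed(edges: List[Edge], length: int, goal: str = "S") -> bool:
    """True when no inactive `goal` edge spans the whole sentence (0..length)."""
    return not any(
        not e.active and e.category == goal and e.start == 0 and e.end == length
        for e in edges
    )


def hypothesize_rules(
    edges: List[Edge], start: int, end: int, expected: str, max_rhs: int = 3
) -> Set[Tuple[str, Tuple[str, ...]]]:
    """Propose rules `expected -> c1 ... ck` (k <= max_rhs, an assumed limit)
    from sequences of adjacent inactive edges that exactly cover start..end."""
    inactive = [e for e in edges if not e.active]
    hypotheses: Set[Tuple[str, Tuple[str, ...]]] = set()

    def extend(pos: int, rhs: Tuple[str, ...]) -> None:
        if pos == end:
            if rhs:
                hypotheses.add((expected, rhs))
            return
        if len(rhs) == max_rhs:
            return
        for e in inactive:
            # Only follow edges that start here, make progress, and stay in range.
            if e.start == pos and pos < e.end <= end:
                extend(e.end, rhs + (e.category,))

    extend(start, ())
    return hypotheses


if __name__ == "__main__":
    # Partial results for a five-word sentence that received no S edge.
    chart = [
        Edge(0, 1, "Det"), Edge(1, 2, "N"), Edge(0, 2, "NP"),
        Edge(2, 3, "V"),
        Edge(3, 4, "Det"), Edge(4, 5, "N"), Edge(3, 5, "NP"),
    ]
    if parse_failed(chart, 5):
        for lhs, rhs in sorted(hypothesize_rules(chart, 0, 5, "S")):
            print(lhs, "->", " ".join(rhs))
```

Each returned pair is one candidate rule. In the framework described above, such candidates would then be passed to the corpus-based component for statistical selection rather than accepted directly.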

Similar Resources

Automated Software Warehouse Management

This paper proposes a knowledge-based approach to manage software warehouses. It is understood that knowledge acquisition is the bottleneck for intelligent systems of all kinds. Our research focuses on solutions for both theoretical and practical aspects of the bottleneck tasks through the proposed mechanisms of randomization, symbolic representation, and grammatical inference. Key-Words: Knowl...

Acquisition of English anaphora by Iranian EFL learners

The present study examined the acquisition of anaphora in English by Iranian EFL learners as well as Persian speaking children. To do so, the study was conducted in three phases. In the first phase, 40 intermediate female and male EFL learners were selected from Puyan Institute in Takestan, Iran. Then, an off-line based Grammatical Judgment Task was administered. In the second phase, 40 female ...

A New Algorithm for Optimization of Fuzzy Decision Tree in Data Mining

Decision-tree algorithms provide one of the most popular methodologies for symbolic knowledge acquisition. The resulting knowledge, a symbolic decision tree along with a simple inference mechanism, has been praised for comprehensibility. The most comprehensible decision trees have been designed for perfect symbolic data. Classical crisp decision trees (DT) are widely applied to classification t...

Learning Grammatical Constructions

We describe a computational model of the acquisition of early grammatical constructions that exploits two essential features of the human language learner: significant prior knowledge of concepts and individual lexical items, and sensitivity to the statistical properties of the input data. Such principles, previously applied to lexical acquisition, are shown to be useful and necessary for learn...

Level of Grammatical Proficiency and Acquisition of Functional Projections: The case of Iranian learners of English language

Unlike Lexical Projections, Functional Projections (Extended Projections) are more of an ‘abstract’ in nature. Therefore, Functional Projections seem to be acquired later than Lexical Projections by the L2 learners. The present study investigates Iranian L2 learners’ acquisition of English Extended Projections taking into account their level of grammatical proficiency. Specifically, the aim is ...

Publication date: 1994